MAP Inference as Linear Integer Programming

MAP (maximum a posteriori) inference is the task of finding the assignment to the random variables in a Bayesian network that has the highest joint probability among all possible assignments. By "highest joint probability", we equivalently mean the lowest sum of negative log-probabilities of the chosen conditional probability entries. Whenever the word "optimize" gets thrown around, the problem probably lends itself to mathematical programming - and MAP inference is no different: it can be solved as an integer linear program.
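To spell out that equivalence: since the logarithm is monotone, maximizing the product of the conditional probabilities along an assignment is the same as minimizing the sum of their negative logs:

$$\arg\max_{x} \prod_i P\big(x_i \mid x_{\mathrm{pa}(i)}\big) \;=\; \arg\min_{x} \sum_i -\log P\big(x_i \mid x_{\mathrm{pa}(i)}\big)$$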

Let's see how we could do this. We will start by loading a very simple Bayesian network with two nodes, "A" and "B", and a single edge pointing from "A" -> "B".


In [3]:
from pyBN import *

In [4]:
bn = read_bn('data/simple.bn')

In [8]:
print bn.V
print bn.E
print bn.F


['A', 'B']
{'A': ['B'], 'B': []}
{'A': {'values': ['No', 'Yes'], 'parents': [], 'cpt': [0.3, 0.7]}, 'B': {'values': ['No', 'Yes'], 'parents': ['A'], 'cpt': [0.4, 0.6, 0.1, 0.9]}}

From a quick look at the conditional probability values, it should be clear that the joint probability over "A" and "B" is maximized at 0.7 * 0.9 = 0.63 when "A=Yes" and "B=Yes".
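As a quick sanity check - this is just a hand-rolled enumeration of the four possible assignments, with the CPT values hard-coded from the output above, not part of the pyBN API - we can confirm that 0.63 is indeed the best we can do:

p_a = {'No': 0.3, 'Yes': 0.7}
p_b_given_a = {('No', 'No'): 0.4, ('No', 'Yes'): 0.6,
               ('Yes', 'No'): 0.1, ('Yes', 'Yes'): 0.9}

# enumerate every assignment and print its joint probability
for a in ['No', 'Yes']:
    for b in ['No', 'Yes']:
        print a, b, p_a[a] * p_b_given_a[(a, b)]   # A=Yes, B=Yes gives 0.63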

Let's see how this problem can be solved and generalized using a well-known Python optimization library called PuLP. PuLP isn't as powerful as commercial solvers like CPLEX or Gurobi, but it doesn't require a license, ships with the free CBC solver, and can be installed directly from pip.


In [9]:
from pulp import *
import numpy as np

We start by creating a model:


In [99]:
model = LpProblem("MAP Inference", LpMinimize)

We will now create a variable for each entry in the Conditional Probability Table of each node. These are the binary decision variables - each one indicates whether that particular CPT entry is chosen.


In [100]:
x1 = LpVariable("A=No",0,1,LpInteger)
x2 = LpVariable("A=Yes",0,1,LpInteger)
x3 = LpVariable("B=No|A=No",0,1,LpInteger)
x4 = LpVariable("B=Yes|A=No",0,1,LpInteger)
x5 = LpVariable("B=No|A=Yes",0,1,LpInteger)
x6 = LpVariable("B=Yes|A=Yes",0,1,LpInteger)

In [101]:
var_list = [x1,x2,x3,x4,x5,x6]
val_list = -np.log(bn.flat_cpt())

As you can see, the number of decision variables equals the number of conditional probability entries that must be specified. This number grows quickly as the number of nodes (and parents per node) increases - but not nearly as quickly as it would if we worked with the full joint distribution instead of the factored Bayesian network representation.
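Writing the variables out by hand clearly won't scale. Here is a minimal sketch of how they could be generated programmatically for single-parent networks like this one - assuming the bn.F structure printed above, and assuming the flat CPT order has the child's value varying fastest within each parent value (which is what the cpt lists above suggest):

decision_vars = {}
for rv in bn.V:
    f = bn.F[rv]
    if f['parents']:
        parent = f['parents'][0]
        # one indicator per (parent value, child value) pair, child value fastest
        for pval in bn.F[parent]['values']:
            for val in f['values']:
                name = "%s=%s|%s=%s" % (rv, val, parent, pval)
                decision_vars[name] = LpVariable(name, 0, 1, LpInteger)
    else:
        # root node: one indicator per value
        for val in f['values']:
            name = "%s=%s" % (rv, val)
            decision_vars[name] = LpVariable(name, 0, 1, LpInteger)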

Next, we add the objective function - minimizing the sum of the negative log values of the chosen conditional probability entries:


In [102]:
model += np.dot(var_list,val_list)
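Note that np.dot happens to work here because PuLP variables support multiplication and addition by plain Python numbers, so numpy can build the expression term by term. If you'd rather stay within PuLP's own API, the same objective can be written with lpSum (this would replace the np.dot line above, not be added on top of it):

# equivalent objective using PuLP's lpSum instead of np.dot
model += lpSum(v * float(c) for v, c in zip(var_list, val_list))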

Now, let's add constraints to make sure that the conditional probability tables agree on the parent values. That is, the indicator chosen for a given value of a parent node must equal the sum of the indicators for the child's CPT entries that are conditioned on that parent value.


In [103]:
model += x1 == x3+x4, 'Edge Constraint 1'
model += x2 == x5+x6, 'Edge Constraint 2'

Moreover, let's add constraints to make sure only one entry from each CPT can be chosen:


In [104]:
model += x1 + x2 == 1, 'RV Constraint 1'
model += x3 + x4 + x5 + x6 == 1, 'RV Constraint 2'
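Putting the two constraint families together, the general pattern for a network in which every node has at most one parent (as here) looks roughly like this, writing $x_e$ for the indicator that CPT entry $e$ with probability $p_e$ is selected:

$$\min_{x} \sum_{e} -\log(p_e)\, x_e$$

subject to $\sum_{e \in \mathrm{CPT}(i)} x_e = 1$ for every node $i$, and $\sum_{v} x_{(X_i=v \mid X_p=u)} = \sum_{e \in \mathrm{CPT}(p)\,:\,X_p=u} x_e$ for every edge $p \rightarrow i$ and every parent value $u$, with all $x_e \in \{0,1\}$. With multiple parents the consistency constraints have to be written per parent configuration, but the idea is the same.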

Finally, let's solve.


In [105]:
model.solve()


Out[105]:
1
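The return value 1 is PuLP's status code for an optimal solution; we can read the status and the objective value off more explicitly:

print LpStatus[model.status]      # 'Optimal'
print value(model.objective)      # roughly 0.462, i.e. -log(0.7) - log(0.9)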

We can now print out the variables and observe the solution:


In [106]:
for v in model.variables():
    print v.name, v.varValue


A=No 0.0
A=Yes 1.0
B=No|A=No 0.0
B=No|A=Yes 0.0
B=Yes|A=No 0.0
B=Yes|A=Yes 1.0

And there it is - the solver confirms that the optimal assignment is indeed "A=Yes" and "B=Yes", chosen through the CPT entry "B=Yes|A=Yes".
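As a final check, the optimal objective value can be converted back into the MAP probability we computed by hand at the start:

print np.exp(-value(model.objective))   # ~0.63 = P(A=Yes) * P(B=Yes|A=Yes)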